An Empirical Method Exploring a Large Set of Features for Authorship Identification
Authors
Abstract
In this paper, we address the problem of identifying the author of a document of unknown origin. To solve it, we propose a new hybrid approach that combines statistical and stylistic analysis. The method extracts lexical and syntactic features from the text and uses them as input to a machine learning process that identifies the document's author. Experiments on the PAN@CLEF2014 English literature corpus yield promising results, comparable to those of the best state-of-the-art methods.
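The abstract does not list the exact features or the learner used. As a minimal sketch, assuming a handful of lexical style markers (average word length, type-token ratio, average sentence length, comma rate, function-word frequencies) and a nearest-centroid classifier standing in for the paper's unspecified machine learning process, the general pipeline of turning texts into feature vectors and attributing an unknown document to the closest known author might look like this. Syntactic features such as part-of-speech n-grams would need a tagger and are omitted; `FUNCTION_WORDS`, `lexical_features`, `train_centroids` and `predict` are hypothetical names, not the authors' code.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately small function-word list (not the paper's feature set).
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "was", "he", "she", "i"]


def tokenize(text):
    """Lower-case word tokens made of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_features(text):
    """Fixed-length vector of simple lexical style markers (illustrative only)."""
    words = tokenize(text)
    n = len(words) or 1
    counts = Counter(words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg_word_len = sum(len(w) for w in words) / n
    type_token_ratio = len(counts) / n
    avg_sent_len = len(words) / (len(sentences) or 1)
    comma_rate = text.count(",") / n  # crude punctuation marker
    # Relative frequency of each function word, a classic stylometric cue.
    fw_freqs = [counts[w] / n for w in FUNCTION_WORDS]
    return [avg_word_len, type_token_ratio, avg_sent_len, comma_rate] + fw_freqs


def train_centroids(samples):
    """samples: list of (author, text). Returns z-score stats and one centroid per author."""
    labelled = [(author, lexical_features(text)) for author, text in samples]
    vecs = [v for _, v in labelled]
    dims = len(vecs[0])
    means = [sum(v[d] for v in vecs) / len(vecs) for d in range(dims)]
    # Population std per dimension; constant dimensions fall back to 1.0.
    stds = [
        math.sqrt(sum((v[d] - means[d]) ** 2 for v in vecs) / len(vecs)) or 1.0
        for d in range(dims)
    ]
    by_author = {}
    for author, v in labelled:
        z = [(v[d] - means[d]) / stds[d] for d in range(dims)]
        by_author.setdefault(author, []).append(z)
    centroids = {a: [sum(col) / len(col) for col in zip(*zs)] for a, zs in by_author.items()}
    return means, stds, centroids


def predict(model, text):
    """Attribute text to the author whose standardized centroid is nearest (Euclidean)."""
    means, stds, centroids = model
    z = [(x - m) / s for x, m, s in zip(lexical_features(text), means, stds)]
    return min(centroids, key=lambda a: math.dist(z, centroids[a]))


if __name__ == "__main__":
    # Toy usage with two invented "authors" of very different style.
    model = train_centroids([
        ("A", "The cat sat. The dog ran. The man saw the cat."),
        ("B", "Sophisticated international negotiations consistently demonstrated "
              "remarkable institutional resilience, notwithstanding unprecedented uncertainty."),
    ])
    print(predict(model, "The cat ran. The dog sat."))
```

A real system would use far richer features and a trained classifier (for example an SVM); nearest-centroid is chosen here only because it fits in a few self-contained lines.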
Similar resources
Evaluation of Rough Set Theory for Decision Making of Rehabilitation Method for Concrete Pavement
In recent years, a large number of advanced theoretical-empirical methods have been developed for the design and modeling of concrete pavement distress. However, there is no reliable theoretical method for evaluating concrete pavement distress and deciding how to repair it; only empirical methods are used for this purpose. One of the most common methods for evaluating concrete paveme...
CEAI: CCM based Email Authorship Identification Model
In this paper we present a model for email authorship identification (EAI) by employing a Cluster-based Classification (CCM) technique. Traditionally, stylometric features have been successfully employed in various authorship analysis tasks; we extend the traditional feature-set to include some more interesting and effective features for email authorship identification (e.g. the last punctuatio...
A General Investigation on the Combination of Local and Global Feature Selection Methods for Request Identification in Telegram
Nowadays, the use of various messaging services is expanding worldwide with the rapid development of Internet technologies. Telegram is a cloud-based open-source text messaging service. According to the US Securities and Exchange Commission, based on statistics from October 2019 to the present, 300 million people worldwide use Telegram each month. Telegram users are more concentrated in...
An Analysis Framework for Hybrid Authorship Verification
Given a set of candidate authors for whom some texts of undisputed authorship exist, attributing texts of unknown authorship to one of the candidates is called author verification. This problem has attracted great attention due to its new applications in forensic analysis, e-commerce and plagiarism detection. The author verification task is of great help in the plagiarism detection process. Indeed, th...
Modified signed log-likelihood test for the coefficient of variation of an inverse Gaussian population
In this paper, we consider the problem of two-sided hypothesis testing for the coefficient of variation of an inverse Gaussian population. The approach used here is the modified signed log-likelihood ratio (MSLR) method, a modification of the traditional signed log-likelihood ratio test. Previous work shows that this method has third-order accuracy, whereas the traditi...